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exhibits general temporal correlation characterized by its spectral density function. We consider both 
discrete-time and continuous-time channels, and find their asymptotics at low signal-to-noise ratio (SNR). 
Compared to known capacity upper bounds under peak constraints, these asymptotics usually lead to 
negligible rate loss in the low-SNR regime for slowly time-varying fading channels. We further specialize 
to case studies of Gauss-Markov and Clarke's fading models. 
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For Rayleigh fading channels without channel state information (CSI) at low signal-to-noise 
ratio (SNR), the capacity-achieving input gradually tends to bursts of "on" intervals sporadically 
inserted into the "off" background, even under vanishing peak power constraints [1]. This highly 
unbalanced input usually imposes implementation challenges. For example, it is difficult to 
maintain carrier frequency and symbol timing during the long "off" periods. Furthermore, the 
unbalanced input is incompatible with linear codes, unless appropriate symbol mapping (e.g.,
$M$-ary orthogonal modulation with appropriately chosen constellation size $M$) is employed to
match the input distribution.
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This paper investigates the achievable information rate of phase-shift keying (PSK). PSK is 
appealing because it has constant envelope and is amenable to linear codes without additional 
symbol mappings. Focusing on low signal-to-noise ratio (SNR) asymptotics, we utilize a recursive 
training scheme to convert the original fading channel without CSI into a series of parallel 
sub-channels, each with estimated CSI but additional noise that remains circular complex white 
Gaussian. The central results in this paper are as follows. First, for a discrete-time channel whose
unit-variance fading process $\{h_d[k] : -\infty < k < \infty\}$ has a spectral density function $S_{h_d}(e^{j\Omega})$
for $-\pi \le \Omega \le \pi$, the achievable rate is
$\frac{1}{2}\cdot\left[\frac{1}{2\pi}\int_{-\pi}^{\pi} S_{h_d}^2(e^{j\Omega})\,d\Omega - 1\right]\cdot\rho^2 + o(\rho^2)$ nats
per symbol, as the average channel SNR $\rho \to 0$. This achievable rate is at most $\frac{1}{2}\cdot\rho^2 + o(\rho^2)$
away from the channel capacity under peak SNR constraint $\rho$. Second, for a continuous-time
channel whose unit-variance fading process $\{h_c(t) : -\infty < t < \infty\}$ has a spectral density
function $S_{h_c}(j\omega)$ for $-\infty < \omega < \infty$, the achievable rate as the input symbol duration $T \to 0$ is
$\left[1 - \frac{1}{2\pi P}\int_{-\infty}^{\infty}\log\left(1 + P\cdot S_{h_c}(j\omega)\right)d\omega\right]\cdot P$ nats per unit time, where $P > 0$ is the envelope
power. This achievable rate exactly coincides with the channel capacity under peak envelope $P$.

We further apply the above results to specific case studies of Gauss-Markov fading models (both 
discrete-time and continuous-time) as well as a continuous-time Clarke's fading model. For 
discrete-time Gauss-Markov fading processes with innovation rate $\epsilon \ll 1$, the quadratic behavior
of the achievable rate becomes dominant only for $\rho \ll \epsilon$. Our results, combined with previous
results for the high-SNR asymptotics, suggest that coherent communication can essentially be
realized for $\epsilon \ll \rho \ll 1/\epsilon$. For Clarke's model, we find that the achievable rate scales sub-linearly,
but super-quadratically, as $O\left(\log(1/P)\cdot P^2\right)$ nats per unit time as $P \to 0$.

The remainder of this paper is organized as follows. Section II describes the channel model
and the recursive training scheme. Section III deals with the discrete-time channel model, and
Section IV the continuous-time channel model. Finally, Section V provides some concluding
remarks. Throughout the paper, random variables are in bold font. All logarithms are to base
$e$, and information is measured in nats.
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II. Channel Model, Recursive Training Scheme, and Effective SNR 

We consider a scalar time-selective, frequency non-selective Rayleigh fading channel, written in
baseband-equivalent continuous-time form as

\[ x(t) = h_c(t)\cdot s(t) + z(t), \quad \text{for } -\infty < t < \infty, \tag{1} \]

where $s(t) \in \mathbb{C}$ and $x(t) \in \mathbb{C}$ denote the channel input and the corresponding output at time
instant $t$, respectively. The additive noise $\{z(t) : -\infty < t < \infty\}$ is modeled as a zero-mean
circular complex Gaussian white noise process with $E\{z(s)z^\dagger(t)\} = \delta(s - t)$. The fading process
$\{h_c(t) : -\infty < t < \infty\}$ is modeled as a wide-sense stationary and ergodic zero-mean circular
complex Gaussian process with unit variance $E\{h_c(t)h_c^\dagger(t)\} = 1$ and with spectral density
function $S_{h_c}(j\omega)$ for $-\infty < \omega < \infty$. Additionally, we impose a technical condition that
$\{h_c(t) : -\infty < t < \infty\}$ is mean-square continuous, so that its autocorrelation function
$K_{h_c}(\tau) = E\{h_c(t + \tau)h_c^\dagger(t)\}$ is continuous for $\tau \in (-\infty, \infty)$.

Throughout the paper, we restrict our attention to PSK over the continuous-time channel (1).
For technical convenience, we let the channel input $s(t)$ have constant envelope $P > 0$ and
piecewise constant phase, i.e.,

\[ s(t) = s[k] = \sqrt{P}\cdot e^{j\theta[k]}, \quad \text{if } kT \le t < (k+1)T, \]

for $-\infty < k < \infty$.¹ The symbol duration $T > 0$ is determined by the reciprocal of the channel
input bandwidth.²

Applying the above channel input to the continuous-time channel and processing the channel
output through a matched filter,³ we obtain a discrete-time channel

\[ x[k] = \sqrt{\rho}\cdot h_d[k]\cdot s[k] + z[k], \quad \text{for } -\infty < k < \infty. \tag{2} \]

¹ Here we note a slight abuse of notation in this paper: a symbol (e.g., $s$) can be either continuous-time or discrete-time.
The two cases are distinguished by $\cdot(t)$ for continuous-time and $\cdot[k]$ for discrete-time.

² For multipath fading channels, $T$ should also be substantially greater than the delay spread [2]; otherwise the frequency
non-selective channel model (1) may not be valid. Throughout the paper we assume that this requirement is met.

³ A matched filter suffers no information loss for white Gaussian channels [3]. For the fading channel (1), it is no longer
optimal in general [4]. However, in this paper we still focus on the matched filter, which is common in most practical systems.

The channel equations (1) and (2) are related through

\[ x[k] = \frac{1}{\sqrt{T}}\int_{kT}^{(k+1)T} x(t)\,dt, \]

\[ z[k] = \frac{1}{\sqrt{T}}\int_{kT}^{(k+1)T} z(t)\,dt, \]

\[ h_d[k] = \frac{1}{\sqrt{\int_0^T\!\int_0^T K_{h_c}(s-t)\,ds\,dt}}\int_{kT}^{(k+1)T} h_c(t)\,dt. \]

For the discrete-time channel (2) we can verify that

• The additive noise $\{z[k] : -\infty < k < \infty\}$ is circular complex Gaussian with zero mean
and unit variance, i.e., $z[k] \sim \mathcal{CN}(0, 1)$, and is independent and identically distributed (i.i.d.)
for different $k$.

• The fading process $\{h_d[k] : -\infty < k < \infty\}$ is wide-sense stationary and ergodic zero-mean
circular complex Gaussian, with $h_d[k]$ being marginally $\mathcal{CN}(0, 1)$. We further notice that
$\{h_d[k] : -\infty < k < \infty\}$ is obtained through sampling the output of the matched filter,
hence its spectral density function is

\[ S_{h_d}(e^{j\Omega}) = \frac{T}{\int_0^T\!\int_0^T K_{h_c}(s-t)\,ds\,dt}\sum_{k=-\infty}^{\infty} S_{h_c}\!\left(j\,\frac{\Omega - 2k\pi}{T}\right)\mathrm{sinc}^2(\Omega - 2k\pi) \]

for $-\pi \le \Omega \le \pi$.

• The channel input $\{s[k] : -\infty < k < \infty\}$ is always on the unit circle. In the sequel, we
will further restrict it to be complex proper [5], i.e., $E\{s^2[k]\} = [E\{s[k]\}]^2$. The simplest
such input is quadrature phase-shift keying (QPSK); by contrast, binary phase-shift keying
(BPSK) is not complex proper.

• The average channel SNR is given by

\[ \rho = \frac{P}{T}\int_0^T\!\int_0^T K_{h_c}(s-t)\,ds\,dt > 0. \tag{3} \]

Throughout the paper, we assume that the realization of the fading process $\{h_c(t) : -\infty < t < \infty\}$
is not directly available to the transmitter or the receiver, but its statistical characterization
in terms of $S_{h_c}(j\omega)$ is precisely known at both the transmitter and the receiver.

We employ a recursive training scheme to communicate over the discrete-time channel (2). By
interleaving the transmitted symbols as illustrated in Figure 1 (cf. [6]), the recursive training
scheme effectively converts the original non-coherent channel into a series of parallel
sub-channels, each with estimated receive CSI but additional noise that remains i.i.d. circular complex
Gaussian. The interleaving scheme decomposes the channel into $L$ parallel sub-channels (PSCs).
The $l$th ($l = 0, 1, \ldots, L-1$) PSC sees all the inputs $s[k\cdot L + l]$ of (2) for $k = 0, 1, \ldots, K-1$.
These $L$ PSCs suffer correlated fading, and this correlation is exactly what we seek to exploit
using recursive training. Although some residual correlation remains within each PSC among its
$K$ symbols, due to the ergodicity of the channel (2), this correlation vanishes as the interleaving
depth $L \to \infty$. In practical systems with finite $L$, if necessary, we may utilize an additional
interleaver for each PSC to make it essentially memoryless.

Fig. 1. Illustration of the interleaving scheme. Input symbols are encoded/decoded column-wise, and transmitted/received
row-wise.

We make a slight abuse of notation in the sequel. Since all the PSCs are viewed as memoryless,
when describing a PSC we can simply suppress the internal index $k$ among its $K$ coding
symbols, and only indicate the PSC index $l$ without loss of generality. For example, $h_d[l]$ actually
corresponds to any $h_d[k\cdot L + l]$, for $k = 0, 1, \ldots, K-1$.
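The index bookkeeping of the interleaver can be sketched as follows. This is only an illustration under assumed sizes ($K = 3$, $L = 4$); the helper names `interleave_indices` and `psc_of` are hypothetical and do not come from the paper.

```python
def interleave_indices(K, L):
    """For each PSC l, list the transmission indices k*L + l that it occupies."""
    return [[k * L + l for k in range(K)] for l in range(L)]


def psc_of(n, L):
    """Map a transmission index n = k*L + l to (k, l): symbol k of PSC l."""
    k, l = divmod(n, L)
    return k, l


K, L = 3, 4  # illustrative sizes only (assumption)
pscs = interleave_indices(K, L)

# Symbols are sent row-wise over the K x L array ...
transmit_order = list(range(K * L))
# ... but decoded column-wise, one PSC after another, starting with the pilot PSC 0.
decode_order = [n for column in pscs for n in column]

print(pscs[1])       # transmission indices carried by PSC 1
print(decode_order)  # order in which the receiver processes symbols
```

With these sizes, consecutive symbols of one PSC are $L$ channel uses apart, which is why the fading seen within a PSC decorrelates as $L$ grows.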

The recursive training scheme performs channel estimation and demodulation/decoding in an
alternating manner. To initialize transmission, PSC 0, the first parallel sub-channel, transmits
pilots rather than information symbols to the receiver. Based upon the received pilots in PSC
0, the receiver predicts $h_d[1]$, the fading coefficient of PSC 1, and proceeds to demodulate and
decode the transmitted symbols in PSC 1 coherently. If the rate of PSC 1 does not exceed
the corresponding channel mutual information, then information theory ensures that, as the
coding block length $K \to \infty$, there always exist codes that have arbitrarily small decoding
error probability. Hence the receiver can, at least in principle, form an error-free reconstruction
of the transmitted symbols in PSC 1, which then effectively become "fresh" pilots to facilitate
the prediction of $h_d[2]$ and subsequent coherent demodulation/decoding of PSC 2. Alternating
the estimation-demodulation/decoding procedure repeatedly, all the PSCs are reliably decoded
one after another.

Remark: A major drawback of the recursive training scheme is that its interleaved structure
typically leads to a large delay. The coding block length $K$ should be large enough such that the
decoding error probability is small enough to prevent catastrophic error propagation along the
PSCs. Furthermore, the number of PSCs $L$ should also be large enough such that the prediction
of the fading process essentially converges to its steady-state limit. Only after receiving all the
$K\cdot L$ symbols in the interleaved block can the receiver perform the alternating
estimation-demodulation/decoding procedure. However, we note that this may not be the case for wideband
channels. In wideband channels with frequency-decorrelated fading processes, we can employ
multi-carrier techniques, e.g., orthogonal frequency-division multiplexing (OFDM), to decompose
the original wide bandwidth into a large number of sub-bands, suffering essentially independent
frequency non-selective fading processes. In this case, each row in Figure 1 corresponds to
a sub-band, and the coding block length (i.e., the number of sub-bands) $K$ increases as the
bandwidth grows. For each PSC, its $K$ coding symbols occur simultaneously in physical time,
hence the receiver need not wait until receiving all the $K\cdot L$ symbols to perform the alternating
estimation-demodulation/decoding procedure.

By induction, let us consider PSC $l$, assuming that the inputs $\{s[i] : i = 0, 1, \ldots, l-1\}$ of the
previous PSCs have all been successfully reconstructed at the receiver. Since the channel inputs
are always on the unit circle, the receiver can compensate for their phases in the channel outputs,
and the resulting observations become

\[ \underbrace{e^{-j\theta[i]}\cdot x[i]}_{x'[i]} = \sqrt{\rho}\cdot h_d[i] + \underbrace{e^{-j\theta[i]}\cdot z[i]}_{z'[i]}, \quad \text{for } i = 0, 1, \ldots, l-1. \]

Since zero-mean circular complex Gaussian distributions are invariant under rotation, the rotated
noise $z'[i]$ is still i.i.d. zero-mean unit-variance circular complex Gaussian. Then we can utilize
standard linear prediction theory (e.g., [7]) to obtain the one-step minimum mean-square error
(MMSE) prediction of $h_d[l]$, defined as

\[ \hat{h}_d[l] = E\left\{h_d[l] \mid \{x'[i] : i = 0, 1, \ldots, l-1\}\right\}. \tag{4} \]

The estimate $\hat{h}_d[l]$ and the estimation error $\tilde{h}_d[l] = h_d[l] - \hat{h}_d[l]$ are jointly circular complex
Gaussian, distributed as $\mathcal{CN}(0, 1 - \sigma^2[l])$ and $\mathcal{CN}(0, \sigma^2[l])$, respectively, and are uncorrelated and
further independent. Here $\sigma^2[l]$ denotes the mean-square prediction error. The channel equation
of PSC $l$ can then be written as

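\[ x[l] = \sqrt{\rho}\cdot h_d[l]\cdot s[l] + z[l] = \sqrt{\rho}\cdot\hat{h}_d[l]\cdot s[l] + \underbrace{\sqrt{\rho}\cdot\tilde{h}_d[l]\cdot s[l] + z[l]}_{\bar{z}[l]}, \tag{5} \]

where the effective noise $\bar{z}[l]$ is circular complex Gaussian, and is independent of both the
channel input $s[l]$ and the estimated fading coefficient $\hat{h}_d[l]$. Thus, the channel (5) becomes a
coherent Gaussian channel with fading and receive CSI $\hat{h}_d[l]$, with effective SNR

\[ \rho[l] = \frac{(1 - \sigma^2[l])\cdot\rho}{\sigma^2[l]\cdot\rho + 1}. \tag{6} \]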

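In the paper we mainly focus on the ultimate performance limit without delay constraints, which
is achieved as the interleaving depth $L \to \infty$. Under mild technical conditions, the one-step
MMSE prediction error sequence $\{\sigma^2[l] : l = 0, 1, \ldots\}$ converges to the limit [8, Chap. XII, Sec. 4]

\[ \sigma_\infty^2 \stackrel{\text{def}}{=} \lim_{l\to\infty}\sigma^2[l] = \frac{1}{\rho}\cdot\left\{\exp\left\{\frac{1}{2\pi}\int_{-\pi}^{\pi}\log\left(1 + \rho\cdot S_{h_d}(e^{j\Omega})\right)d\Omega\right\} - 1\right\}. \tag{7} \]

Consequently the effective SNR (6) sequence $\{\rho[l] : l = 0, 1, \ldots\}$ converges to

\[ \rho_\infty \stackrel{\text{def}}{=} \lim_{l\to\infty}\rho[l] = \frac{1 - \sigma_\infty^2}{\sigma_\infty^2\cdot\rho + 1}\cdot\rho. \tag{8} \]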
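The limits (7) and (8) are straightforward to evaluate numerically for a given spectral density. The sketch below uses a plain midpoint rule; the function names and the two test spectra (a flat spectrum and the first-order moving-average spectrum $1 + \cos\Omega$) are illustrative assumptions, not taken from the paper.

```python
import math


def limiting_prediction_error(spectrum, rho, n_grid=20000):
    """Evaluate (7) with a midpoint rule over [-pi, pi]."""
    step = 2.0 * math.pi / n_grid
    log_integral = 0.0
    for i in range(n_grid):
        omega = -math.pi + (i + 0.5) * step
        log_integral += math.log1p(rho * spectrum(omega)) * step
    return math.expm1(log_integral / (2.0 * math.pi)) / rho


def limiting_effective_snr(spectrum, rho, n_grid=20000):
    """Evaluate (8) from the limiting prediction error."""
    sigma2 = limiting_prediction_error(spectrum, rho, n_grid)
    return (1.0 - sigma2) * rho / (sigma2 * rho + 1.0)


def flat_spectrum(omega):
    return 1.0  # memoryless fading: the past carries no information


def ma1_spectrum(omega):
    return 1.0 + math.cos(omega)  # unit-variance fading with K[1] = 1/2 (assumed example)


rho = 0.1
sigma2_flat = limiting_prediction_error(flat_spectrum, rho)
sigma2_corr = limiting_prediction_error(ma1_spectrum, rho)
print(sigma2_flat, sigma2_corr)
print(limiting_effective_snr(ma1_spectrum, rho))
```

For the flat spectrum the prediction error stays at 1 and the effective SNR collapses to zero, whereas any temporal correlation pushes the prediction error below 1.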

We are mainly interested in evaluating the mutual information of the induced channel (5) at the
limiting effective SNR $\rho_\infty$ as the actual channel SNR $\rho \to 0$. This low-SNR channel analysis
is facilitated by the explicit second-order expansion formulas of the channel mutual information
at low SNR [9]. Applying [9, Theorem 3] to the induced channel⁴ (5) at $\rho_\infty$, we have

\[ R = \lim_{l\to\infty} I\left(s[l]; x[l] \mid \hat{h}_d[l]\right) = \rho_\infty - \rho_\infty^2 + o(\rho^2) \quad \text{as } \rho \to 0. \tag{9} \]



III. Asymptotic Channel Mutual Information at Low SNR 

As shown in (9), the asymptotic channel mutual information depends on the limiting effective
SNR (8), which further relates to the limiting one-step MMSE prediction error (7). The following
theorem evaluates the asymptotic behavior of the channel mutual information.

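Theorem 3.1: For the discrete-time channel (2), as $\rho \to 0$, its induced channel (5) achieves the
rate

\[ R = \frac{1}{2}\cdot\left[\frac{1}{2\pi}\int_{-\pi}^{\pi} S_{h_d}^2(e^{j\Omega})\,d\Omega - 1\right]\cdot\rho^2 + o(\rho^2), \tag{10} \]

if the integral $(1/2\pi)\cdot\int_{-\pi}^{\pi} S_{h_d}^2(e^{j\Omega})\,d\Omega$ exists.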



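Proof: We will prove that

\[ \sigma_\infty^2 = 1 - \frac{1}{2}\cdot\left[\frac{1}{2\pi}\int_{-\pi}^{\pi} S_{h_d}^2(e^{j\Omega})\,d\Omega - 1\right]\cdot\rho + o(\rho), \tag{11} \]

which together with (8) leads to

\[ \rho_\infty = \frac{1}{2}\cdot\left[\frac{1}{2\pi}\int_{-\pi}^{\pi} S_{h_d}^2(e^{j\Omega})\,d\Omega - 1\right]\cdot\rho^2 + o(\rho^2). \]

Then (10) immediately follows from (9).

For simplicity let us denote by $g(\rho)$ the integral $(1/2\pi)\cdot\int_{-\pi}^{\pi}\log\left(1 + \rho\cdot S_{h_d}(e^{j\Omega})\right)d\Omega$, hence

\[ \lim_{\rho\to 0} g(\rho) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\log 1\,d\Omega = 0, \]

\[ \lim_{\rho\to 0}\frac{dg(\rho)}{d\rho} = \frac{1}{2\pi}\int_{-\pi}^{\pi} S_{h_d}(e^{j\Omega})\,d\Omega = 1, \]

\[ \lim_{\rho\to 0}\frac{d^2g(\rho)}{d\rho^2} = -\frac{1}{2\pi}\int_{-\pi}^{\pi} S_{h_d}^2(e^{j\Omega})\,d\Omega. \]

To prove (11), we apply l'Hôpital's rule in (7) to evaluate

\[ \lim_{\rho\to 0}\sigma_\infty^2 = \lim_{\rho\to 0}\frac{e^{g(\rho)} - 1}{\rho} = \lim_{\rho\to 0} e^{g(\rho)}\,\frac{dg(\rho)}{d\rho} = 1, \]

\[ \lim_{\rho\to 0}\frac{d\sigma_\infty^2}{d\rho} = \lim_{\rho\to 0}\frac{\rho\, e^{g(\rho)}\,\frac{dg(\rho)}{d\rho} - e^{g(\rho)} + 1}{\rho^2} = \lim_{\rho\to 0}\frac{e^{g(\rho)}}{2}\left[\frac{d^2g(\rho)}{d\rho^2} + \left(\frac{dg(\rho)}{d\rho}\right)^2\right] = -\frac{1}{2}\cdot\left[\frac{1}{2\pi}\int_{-\pi}^{\pi} S_{h_d}^2(e^{j\Omega})\,d\Omega - 1\right]. \]

Substituting the above quantities into the first-order Taylor expansion of $\sigma_\infty^2$, we then obtain
(11). Q.E.D.

⁴ Note that [9, Theorem 3] is only applicable to complex proper channel inputs, as we have assumed in the channel model.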

Theorem 3.1 states that for PSK at low SNR, the achievable channel mutual information vanishes
quadratically with SNR. This is consistent with [10], [11]. Furthermore, it is of particular interest
to compare the asymptotic expansion (10) with several previous results.
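As a numerical sanity check of the quadratic law in Theorem 3.1, the sketch below compares the second-order rate $\rho_\infty - \rho_\infty^2$ from (9) with the predicted coefficient for one assumed example spectrum, $S_{h_d}(e^{j\Omega}) = 1 + \cos\Omega$. The spectrum and all function names are illustrative assumptions; the $o(\rho^2)$ remainder in (9) is simply dropped.

```python
import math


def ma1_spectrum(omega):
    """Assumed example: unit-variance fading with autocorrelation K[0] = 1, K[+-1] = 1/2."""
    return 1.0 + math.cos(omega)


def circle_average(f, n_grid=4096):
    """Approximate (1/2pi) * integral of f over [-pi, pi] by a midpoint rule."""
    step = 2.0 * math.pi / n_grid
    return sum(f(-math.pi + (i + 0.5) * step) for i in range(n_grid)) / n_grid


def induced_rate(spectrum, rho):
    """rho_inf - rho_inf^2, i.e. (9) without its o(rho^2) remainder, using (7) and (8)."""
    g = circle_average(lambda w: math.log1p(rho * spectrum(w)))
    sigma2 = math.expm1(g) / rho
    rho_inf = (1.0 - sigma2) * rho / (sigma2 * rho + 1.0)
    return rho_inf - rho_inf ** 2


# Coefficient of rho^2 predicted by (10).
coefficient = 0.5 * (circle_average(lambda w: ma1_spectrum(w) ** 2) - 1.0)

# The normalized rate R / rho^2 should approach that coefficient as rho shrinks.
ratios = [induced_rate(ma1_spectrum, rho) / rho ** 2 for rho in (1e-1, 1e-2, 1e-3)]
print(coefficient, ratios)
```

The ratios move toward the coefficient as $\rho$ decreases, which is the behavior (10) describes.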

A. Comparison with a Capacity Upper Bound 

For the discrete-time channel (2), PSK with constant SNR $\rho$ is a particular peak-limited channel
input. The capacity per unit energy of channel (2) under a peak SNR constraint $\rho$ is [1]

\[ \dot{C} = 1 - \frac{1}{2\pi\rho}\int_{-\pi}^{\pi}\log\left(1 + \rho\cdot S_{h_d}(e^{j\Omega})\right)d\Omega, \tag{12} \]

achieved by on-off keying (OOK) in which each "on" or "off" symbol corresponds to an infinite
number of channel uses, and the probability of choosing "on" symbols vanishes. Such "bursty"
channel inputs are in sharp contrast to PSK. From (12), an upper bound to the channel capacity
can be derived as [1]

\[ C \le U(\rho) \stackrel{\text{def}}{=} \left[1 - \frac{1}{2\pi\rho}\int_{-\pi}^{\pi}\log\left(1 + \rho\cdot S_{h_d}(e^{j\Omega})\right)d\Omega\right]\cdot\rho = \frac{1}{2}\cdot\frac{1}{2\pi}\int_{-\pi}^{\pi} S_{h_d}^2(e^{j\Omega})\,d\Omega\cdot\rho^2 + o(\rho^2). \tag{13} \]

Comparing (10) and (13), we notice that the penalty of using PSK instead of the bursty
capacity-achieving channel input is at most $(1/2)\cdot\rho^2 + o(\rho^2)$. For fast time-varying fading processes,
this penalty can be relatively significant. For instance, if the fading process is memoryless, i.e.,
$S_{h_d}(e^{j\Omega}) = 1$ for $-\pi \le \Omega \le \pi$, then $(1/2\pi)\cdot\int_{-\pi}^{\pi} S_{h_d}^2(e^{j\Omega})\,d\Omega - 1 = 0$, implying that no
information can be transmitted using PSK over a memoryless fading channel. Fortunately, for
slowly time-varying fading processes, the integral $(1/2\pi)\cdot\int_{-\pi}^{\pi} S_{h_d}^2(e^{j\Omega})\,d\Omega$ is typically much
greater than 1, as we will illustrate in the sequel.



B. Comparison with the High-SNR Channel Behavior 

From (10) and (13), it can be said that $(1/2\pi)\cdot\int_{-\pi}^{\pi} S_{h_d}^2(e^{j\Omega})\,d\Omega$ is a fundamental quantity
associated with a fading process at low SNR. This is in contrast to the high-SNR regime, where
a fundamental quantity is [12]

\[ \sigma_{\text{pred}}^2 \stackrel{\text{def}}{=} \exp\left\{\frac{1}{2\pi}\int_{-\pi}^{\pi}\log S_{h_d}(e^{j\Omega})\,d\Omega\right\}. \]

The quantity $\sigma_{\text{pred}}^2$ is the one-step MMSE prediction error of $h_d[0]$ given its entire noiseless past
$\{h_d[-1], h_d[-2], \ldots\}$. When $\sigma_{\text{pred}}^2 > 0$ the process is said to be regular; and when $\sigma_{\text{pred}}^2 = 0$ it is
said to be deterministic, that is, the entire future $\{h_d[0], h_d[1], \ldots\}$ can be exactly reconstructed
(in the mean-square sense) by linearly combining the entire past $\{h_d[-1], h_d[-2], \ldots\}$. It has
been established in [12], [13] that, for regular fading processes,

\[ C = \log\log\rho - 1 - \gamma + \log\frac{1}{\sigma_{\text{pred}}^2} + o(1) \quad \text{as } \rho \to \infty, \tag{14} \]

where $\gamma = 0.5772\ldots$ is Euler's constant, and for deterministic fading processes,

\[ \frac{C}{\log\rho} \to \frac{1}{2\pi}\cdot\mu\left(\{\Omega : S_{h_d}(e^{j\Omega}) = 0\}\right) \quad \text{as } \rho \to \infty, \tag{15} \]

where $\mu(\cdot)$ denotes the Lebesgue measure on the interval $[-\pi, \pi]$.

It is then an interesting issue to investigate the connection between $(1/2\pi)\cdot\int_{-\pi}^{\pi} S_{h_d}^2(e^{j\Omega})\,d\Omega$ and
$\sigma_{\text{pred}}^2$. However, as the following two examples reveal, there is no explicit relationship between
these two quantities.

1) Example 1: Even a deterministic fading process can lead to poor low-SNR performance:
Consider the following class of spectral density functions $S_{h_d}(e^{j\Omega})$, as illustrated in Figure 2:

\[ S_{h_d}(e^{j\Omega}) = \begin{cases} \dfrac{n}{n-1}, & \text{if } |\Omega| \le \pi - \dfrac{\pi}{n} \\[2mm] 0, & \text{if } \pi - \dfrac{\pi}{n} < |\Omega| \le \pi. \end{cases} \]

Fig. 2. Spectral density function of a deterministic fading process that leads to poor low-SNR performance. The narrow
notches on the spectrum make the process deterministic, while the remaining almost unit spectrum makes it behave as if nearly
memoryless in the low-SNR regime for large $n$.

Since $S_{h_d}(e^{j\Omega}) = 0$ for certain intervals with non-zero measure, the corresponding fading process
is deterministic with $\sigma_{\text{pred}}^2 = 0$ [8]. However, this class of $S_{h_d}(e^{j\Omega})$ leads to

\[ \frac{1}{2\pi}\int_{-\pi}^{\pi} S_{h_d}^2(e^{j\Omega})\,d\Omega = \frac{n}{n-1} \to 1 \quad \text{as } n \to \infty, \]

resulting in vanishing values of the quadratic coefficient in (10).
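The claim of Example 1 can be checked numerically. The sketch below assumes the notched spectrum described in Example 1 (height $n/(n-1)$ for $|\Omega| \le \pi - \pi/n$, zero on the notches next to $\pm\pi$) and evaluates the coefficient of $\rho^2$ in (10); the function names and the grid choice are assumptions made for illustration.

```python
import math


def notch_spectrum(omega, n):
    """Assumed form of Example 1: n/(n-1) on |omega| <= pi - pi/n, zero elsewhere."""
    return n / (n - 1.0) if abs(omega) <= math.pi - math.pi / n else 0.0


def second_moment(n):
    """(1/2pi) * integral of S^2 by a midpoint rule whose cell edges hit the notch edges."""
    n_grid = 200 * n  # a multiple of 2n, so no midpoint sits on a discontinuity
    step = 2.0 * math.pi / n_grid
    total = sum(notch_spectrum(-math.pi + (i + 0.5) * step, n) ** 2 for i in range(n_grid))
    return total / n_grid


def quadratic_coefficient(n):
    """Coefficient of rho^2 in (10) for the notched spectrum."""
    return 0.5 * (second_moment(n) - 1.0)


for n in (2, 10, 100):
    print(n, second_moment(n), quadratic_coefficient(n))
```

The coefficient equals $\frac{1}{2(n-1)}$ under these assumptions, so it shrinks toward zero as the notches narrow even though the process stays deterministic.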



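2) Example 2: Even an almost memoryless fading process can lead to good low-SNR performance:
Consider the following class of spectral density functions $S_{h_d}(e^{j\Omega})$, as illustrated in
Figure 3:

\[ S_{h_d}(e^{j\Omega}) = \begin{cases} n, & \text{if } |\Omega| \le \dfrac{\pi}{n\sqrt{n}} \\[2mm] \dfrac{\sqrt{n} - 1}{\sqrt{n} - 1/n}, & \text{if } \dfrac{\pi}{n\sqrt{n}} < |\Omega| \le \pi, \end{cases} \qquad n = 2, 3, \ldots \]

For large $n$ the fading process becomes almost memoryless since

\[ \sigma_{\text{pred}}^2 = \exp\left\{\frac{\log n}{n\sqrt{n}} + \left(1 - \frac{1}{n\sqrt{n}}\right)\cdot\log\frac{\sqrt{n} - 1}{\sqrt{n} - 1/n}\right\} \to 1 \quad \text{as } n \to \infty. \]

However, this class of $S_{h_d}(e^{j\Omega})$ also leads to

\[ \frac{1}{2\pi}\int_{-\pi}^{\pi} S_{h_d}^2(e^{j\Omega})\,d\Omega = \sqrt{n} + \frac{(\sqrt{n} - 1)^2}{\sqrt{n}\cdot(\sqrt{n} - 1/n)} \to \infty \quad \text{as } n \to \infty. \]

Fig. 3. Spectral density function of an almost memoryless fading process that leads to good low-SNR performance. The almost
unit spectrum makes the process nearly memoryless, while the narrow impulse-like spectrum peak significantly contributes to
the integral $(1/2\pi)\cdot\int_{-\pi}^{\pi} S_{h_d}^2(e^{j\Omega})\,d\Omega$, leading to good low-SNR performance for large $n$.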



C. Case Study: Discrete-Time Gauss-Markov Fading Processes 



In this subsection, we apply Theorem 3.1 to analyze a specific class of discrete-time fading
processes, namely, the discrete-time Gauss-Markov fading processes. The fading process in the
channel model can be described by a first-order auto-regressive (AR) evolution equation of the
form

\[ h_d[k+1] = \sqrt{1-\epsilon}\cdot h_d[k] + \sqrt{\epsilon}\cdot v[k+1], \tag{16} \]

where the innovation sequence $\{v[k] : -\infty < k < \infty\}$ consists of i.i.d. $\mathcal{CN}(0, 1)$ random
variables, and $v[k+1]$ is independent of $\{h_d[i] : -\infty < i \le k\}$. The innovation rate $\epsilon$ satisfies
$0 < \epsilon \le 1$.

The spectral density function $S_{h_d}(e^{j\Omega})$ for such a process is

\[ S_{h_d}(e^{j\Omega}) = \frac{\epsilon}{(2-\epsilon) - 2\sqrt{1-\epsilon}\cdot\cos\Omega}, \quad -\pi \le \Omega \le \pi. \tag{17} \]

Hence

\[ \frac{1}{2\pi}\int_{-\pi}^{\pi} S_{h_d}^2(e^{j\Omega})\,d\Omega = \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{\epsilon^2}{\left[(2-\epsilon) - 2\sqrt{1-\epsilon}\cdot\cos\Omega\right]^2}\,d\Omega = \frac{2}{\epsilon} - 1. \]

Applying Theorem 3.1 we find that for the discrete-time Gauss-Markov fading model,

\[ R = \left(\frac{1}{\epsilon} - 1\right)\cdot\rho^2 + o(\rho^2) \quad \text{as } \rho \to 0. \tag{18} \]
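The value $2/\epsilon - 1$ of the integral can be cross-checked in two independent ways: by direct numerical integration of the squared spectrum (17), and through Parseval's relation as the sum of squared autocorrelation magnitudes $\sum_k (1-\epsilon)^{|k|}$. The sketch below does both for one assumed value, $\epsilon = 0.5$; the function names are illustrative.

```python
import math


def gauss_markov_spectrum(omega, eps):
    """Spectral density (17) of the AR(1) fading process (16)."""
    return eps / ((2.0 - eps) - 2.0 * math.sqrt(1.0 - eps) * math.cos(omega))


def second_moment(eps, n_grid=4096):
    """(1/2pi) * integral of S^2 over [-pi, pi] by a midpoint rule."""
    step = 2.0 * math.pi / n_grid
    total = sum(gauss_markov_spectrum(-math.pi + (i + 0.5) * step, eps) ** 2
                for i in range(n_grid))
    return total / n_grid


eps = 0.5  # illustrative innovation rate (assumption)
m2_numeric = second_moment(eps)
m2_closed = 2.0 / eps - 1.0
# Parseval: the same quantity is the sum over k of |K[k]|^2 = (1 - eps)^|k|.
m2_parseval = sum((1.0 - eps) ** abs(k) for k in range(-200, 201))
coefficient = 0.5 * (m2_numeric - 1.0)  # coefficient of rho^2 in (18)
print(m2_numeric, m2_closed, m2_parseval, coefficient)
```

All three evaluations agree, and the resulting coefficient matches $1/\epsilon - 1$ in (18).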

For practical systems in which the fading processes are underspread [2], the innovation rate e 
typically ranges from 1.8 x 10"^ to 3 x 10"'^ [14]. So the (1/2) ■ + o(p^) rate penalty of PSK 
with respect to optimal, peak-limited signaling may well be essentially negligible at low SNR. 

Due to the simplicity of the discrete-time Gauss-Markov fading model, we are able to carry out
a non-asymptotic analysis to gain more insight. Applying (17) to (7), the steady-state limiting
channel prediction error is

\[ \sigma_\infty^2 = \frac{(\rho - 1)\cdot\epsilon + \sqrt{(\rho - 1)^2\cdot\epsilon^2 + 4\rho\epsilon}}{2\rho}. \tag{19} \]

Further applying (19) to (8), we can identify the following three qualitatively distinct operating
regimes of the induced channel (5) for small $\epsilon \ll 1$:

• The quadratic regime: For $\rho \ll \epsilon$, $\sigma_\infty^2 \approx 1 - \rho/\epsilon$, $\rho_\infty \approx \rho^2/\epsilon$;

• The linear regime: For $\epsilon \ll \rho \ll 1/\epsilon$, $\sigma_\infty^2 \approx \sqrt{\epsilon/\rho}$, $\rho_\infty \approx \rho$;

• The saturation regime: For $1/\epsilon \ll \rho$, $\sigma_\infty^2 \approx \epsilon$, $\rho_\infty \approx 1/\epsilon$.

Figure 4 illustrates these three regimes for $\epsilon = 10^{-4}$. The different slopes of $\rho_\infty$ on the log-log
plot are clearly visible for the three regimes. The linear regime covers roughly 80 dB, from $-40$
dB to $+40$ dB, in this particular example.
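The three regimes can be reproduced directly from the closed form (19) together with (8). The sketch below evaluates one SNR deep inside each regime for $\epsilon = 10^{-4}$; the function names and the three probe SNRs are assumptions chosen for illustration.

```python
import math


def sigma2_inf(rho, eps):
    """Steady-state prediction error (19) for the discrete-time Gauss-Markov model."""
    return ((rho - 1.0) * eps
            + math.sqrt((rho - 1.0) ** 2 * eps ** 2 + 4.0 * rho * eps)) / (2.0 * rho)


def rho_inf(rho, eps):
    """Limiting effective SNR (8)."""
    s2 = sigma2_inf(rho, eps)
    return (1.0 - s2) * rho / (s2 * rho + 1.0)


eps = 1e-4
quadratic = rho_inf(1e-6, eps)  # rho << eps: expect roughly rho^2 / eps
linear = rho_inf(1.0, eps)      # eps << rho << 1/eps: expect roughly rho
saturated = rho_inf(1e6, eps)   # rho >> 1/eps: expect roughly 1 / eps

print(quadratic, 1e-6 ** 2 / eps)
print(linear, 1.0)
print(saturated, 1.0 / eps)
```

Each probe lands within a few percent of its regime approximation. Algebraically, (19) is the positive root of $\rho\,\sigma^4 + \epsilon(1-\rho)\,\sigma^2 - \epsilon = 0$, which gives a convenient consistency check on any implementation.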

An interesting observation is that the two SNR thresholds dividing the three regimes are
determined by a single parameter $\epsilon$, which happens to be the one-step MMSE prediction error $\sigma_{\text{pred}}^2$
for the discrete-time Gauss-Markov fading process. The $1/\epsilon$ threshold dividing the linear and the
saturation regimes coincides with that obtained in [14], where it is obtained for circular complex
Gaussian inputs with nearest-neighbor decoding. In this paper we investigate PSK, which results
in a penalty in the achievable rate at high SNR. More specifically, it can be shown that the
achievable rate for large $\rho$ behaves like $(1/2)\cdot\log\min\{\rho, 1/\epsilon\} + O(1)$ [15].

Fig. 4. Case study of the discrete-time Gauss-Markov fading model: Illustration of the three operating regimes, $\epsilon = 10^{-4}$.

A further observation relevant to low-SNR system design is that the $\epsilon$ threshold dividing the
quadratic and the linear regimes clearly indicates when the low-SNR asymptotic channel behavior
becomes dominant. Since the innovation rate $\epsilon$ for underspread fading processes is typically small,
we essentially have a low-SNR channel with perfect receive CSI above $\rho = \epsilon$. This suggests
that there may be an "optimal" SNR at which the low-SNR capacity limit is the most closely
approached. Figure 5 plots the normalized achievable rate $R/\rho$ vs. SNR, in which the achievable
rate $R$ is numerically evaluated for the induced channel (5) using QPSK. Although all the curves
vanish rapidly below the threshold $\rho = \epsilon$, for certain $\rho > \epsilon$, the normalized achievable rate can
be reasonably close to 1. For example, taking e = 10~^, the "optimal" SNR is p ~ —15 dB, and 
the corresponding normalized achievable rate is above 0.9, i.e., more than 90% of the low-SNR 
capacity limit is achieved. 

IV. Filling the Gap to Capacity by Widening the Input Bandwidth 

In Section III we have investigated the achievable information rate of the discrete-time channel
(2), which is obtained from the continuous-time channel (1) as described in Section II. The
symbol duration $T$ there is a fixed system parameter. In this section we will show that, if we are
allowed to reduce $T$, i.e., widen the input bandwidth, then the recursive training scheme using
PSK achieves an information rate that is asymptotically consistent with the channel capacity
under peak envelope $P$. More specifically, we have the following theorem.

Theorem 4.1: For the continuous-time channel (1) with envelope $P > 0$, as the symbol duration
$T \to 0$, its induced channel (5) achieves

\[ \lim_{T\to 0}\frac{R}{T} = \left[1 - \frac{1}{P}\cdot\frac{1}{2\pi}\int_{-\infty}^{\infty}\log\left(1 + P\cdot S_{h_c}(j\omega)\right)d\omega\right]\cdot P. \tag{20} \]

Fig. 5. Normalized rate $R/\rho$ vs. SNR for recursive training with QPSK on the discrete-time Gauss-Markov fading channel. As
a comparison, the dashed-dot curve is the channel capacity (normalized by SNR) with perfect receive CSI, achieved by circular
complex Gaussian inputs.



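Proof: In Section II we have noted that the spectral density function $S_{h_d}(e^{j\Omega})$ of the discrete-time
fading process is related to $S_{h_c}(j\omega)$ through

\[ S_{h_d}(e^{j\Omega}) = \frac{T}{\int_0^T\!\int_0^T K_{h_c}(s-t)\,ds\,dt}\sum_{k=-\infty}^{\infty} S_{h_c}\!\left(j\,\frac{\Omega - 2k\pi}{T}\right)\mathrm{sinc}^2(\Omega - 2k\pi), \]

and that the SNR of the discrete-time channel (2) is given by

\[ \rho = \frac{P}{T}\int_0^T\!\int_0^T K_{h_c}(s-t)\,ds\,dt. \]

For the proof, the following two identities are useful:

\[ \lim_{T\to 0}\frac{\rho}{T} = P, \tag{21} \]

\[ \lim_{T\to 0}\frac{1}{T}\cdot\frac{1}{2\pi}\int_{-\pi}^{\pi}\log\left(1 + \rho\cdot S_{h_d}(e^{j\Omega})\right)d\Omega = \frac{1}{2\pi}\int_{-\infty}^{\infty}\log\left(1 + P\cdot S_{h_c}(j\omega)\right)d\omega. \tag{22} \]

The second identity (22) has been established in [1, Claims 8.1 and 8.2]. To prove the first
one (21), note that for the mean-square continuous fading process $\{h_c(t) : -\infty < t < \infty\}$, its
autocorrelation function $K_{h_c}(\tau) = E\{h_c(t + \tau)h_c^\dagger(t)\}$ is continuous for all $-\infty < \tau < \infty$. Hence
for any $T > 0$, there exists $T^* \in [0, T]$ such that

\[ \int_0^T\!\int_0^T K_{h_c}(s-t)\,ds\,dt = K_{h_c}(T^*)\cdot T^2 \sim T^2 \quad \text{as } T \to 0. \]

Now substituting (21) and (22) into (7), we have

\begin{align*}
\sigma_\infty^2 &= \frac{1}{\rho}\cdot\left\{\exp\left\{\frac{1}{2\pi}\int_{-\pi}^{\pi}\log\left(1 + \rho\cdot S_{h_d}(e^{j\Omega})\right)d\Omega\right\} - 1\right\} \\
&= \frac{1}{P\cdot T + o(T)}\cdot\left\{\exp\left\{\frac{1}{2\pi}\int_{-\infty}^{\infty}\log\left(1 + P\cdot S_{h_c}(j\omega)\right)d\omega\cdot T + o(T)\right\} - 1\right\} \\
&= \frac{1}{P\cdot T + o(T)}\cdot\left\{\frac{1}{2\pi}\int_{-\infty}^{\infty}\log\left(1 + P\cdot S_{h_c}(j\omega)\right)d\omega\cdot T + o(T)\right\} \\
&= \frac{1}{P}\cdot\frac{1}{2\pi}\int_{-\infty}^{\infty}\log\left(1 + P\cdot S_{h_c}(j\omega)\right)d\omega + o(1) \quad \text{as } T \to 0.
\end{align*}

That is,

\[ \lim_{T\to 0}\sigma_\infty^2 = \frac{1}{P}\cdot\frac{1}{2\pi}\int_{-\infty}^{\infty}\log\left(1 + P\cdot S_{h_c}(j\omega)\right)d\omega. \tag{23} \]

Then substituting (21) and (23) into (8), we have

\[ \lim_{T\to 0}\frac{\rho_\infty}{T} = \lim_{T\to 0}\frac{(1 - \sigma_\infty^2)\cdot\rho}{(\sigma_\infty^2\cdot\rho + 1)\cdot T} = \left[1 - \frac{1}{P}\cdot\frac{1}{2\pi}\int_{-\infty}^{\infty}\log\left(1 + P\cdot S_{h_c}(j\omega)\right)d\omega\right]\cdot P. \tag{24} \]

Finally Theorem 4.1 immediately follows from substituting (24) into (9). Q.E.D.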



Again we compare the asymptotic achievable rate (20) to a capacity upper bound based upon
the capacity per unit energy. For the continuous-time channel (1), the capacity per unit energy
under a peak envelope constraint $P > 0$ is [1]

\[ \dot{C} = 1 - \frac{1}{2\pi P}\int_{-\infty}^{\infty}\log\left(1 + P\cdot S_{h_c}(j\omega)\right)d\omega, \tag{25} \]

and the related capacity upper bound (measured per unit time) is [1]

\[ C \le U(P) \stackrel{\text{def}}{=} \left[1 - \frac{1}{2\pi P}\int_{-\infty}^{\infty}\log\left(1 + P\cdot S_{h_c}(j\omega)\right)d\omega\right]\cdot P. \tag{26} \]

Comparing (20) and (26), it is surprising to notice that these two quantities coincide. Recalling
that in Section III we have noticed a $(1/2)\cdot\rho^2 + o(\rho^2)$ rate penalty in discrete-time channels,
we conclude that widening the input bandwidth eliminates this penalty and essentially results in
an asymptotically capacity-achieving scheme in the wideband regime.⁵

The channel capacity of continuous-time peak-limited wideband fading channels (20) was
originally obtained in [16]. However, in [16] the capacity is achieved by frequency-shift keying (FSK),
which is bursty in frequency. In our Theorem 4.1 we show that the capacity is also achievable
if we employ recursive training and PSK, which is bursty in neither time nor frequency.

After some manipulations of (20), we further have that

• As $P \to 0$,

\[ \frac{\lim_{T\to 0}(R/T)}{P^2} \to \frac{1}{2}\cdot\frac{1}{2\pi}\int_{-\infty}^{\infty} S_{h_c}^2(j\omega)\,d\omega, \tag{27} \]

if the above integral exists.

• As $P \to \infty$,

\[ \frac{\lim_{T\to 0}(R/T)}{P} \to 1. \tag{28} \]

In the sequel we will see that (27) and (28) are useful for asymptotic analysis.



A. An Intuitive Explanation of Theorem 4.1

In our proof of Theorem 4.1 we have utilized identities (21) and (22) to conveniently relate the
continuous-time channel (1) to the discrete-time channel (2). However, these identities also have
concealed much of the intuition contained in the derivation. To further illustrate the underlying
mechanism in Theorem 4.1, here we give an alternative argument. Although the following
reasoning is not mathematically rigorous, it does provide an intuitive way to understand the
channel behavior as the symbol duration $T \to 0$.

In Section II we have described the conversion from the continuous-time channel (1) to the
discrete-time channel (2). Strictly speaking, the discrete-time fading coefficient $h_d[k]$ is the $k$th
sample of the matched-filtered, continuous-time fading process. The matched-filtering effect can
be viewed as averaging $h_c(t)$ within a symbol interval of length $T$. Since we have assumed
that the continuous-time fading process $\{h_c(t) : -\infty < t < \infty\}$ is mean-square continuous in
$t$, roughly speaking, as $T \to 0$, the discrete-time fading coefficient $h_d[k] \approx h_c(kT)$, and the
SNR per symbol $\rho \approx P\cdot T$. Furthermore, compared to sufficiently small $T$, the fading process
$\{h_c(t) : -\infty < t < \infty\}$ can be viewed as essentially band-limited. So the discrete-time fading
process $\{h_d[k] : -\infty < k < \infty\}$ is approximately the sampled continuous-time fading process
$\{h_c(t) : -\infty < t < \infty\}$ with sampling rate well beyond its Nyquist rate, and we may write
$S_{h_d}(e^{j\Omega}) \approx (1/T)\cdot S_{h_c}(j\Omega/T)$ for $-\pi \le \Omega \le \pi$.

⁵ Again, the same caveat as in footnote 2 applies.

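Now let us apply the above approximations to (7) to evaluate $\sigma_\infty^2$ for small $T$:

\begin{align*}
\sigma_\infty^2 &\approx \frac{1}{P\cdot T}\cdot\left\{\exp\left\{\frac{1}{2\pi}\int_{-\pi}^{\pi}\log\left(1 + P\cdot T\cdot\frac{1}{T}\cdot S_{h_c}(j\Omega/T)\right)d\Omega\right\} - 1\right\} \\
&= \frac{1}{P\cdot T}\cdot\left\{\exp\left\{\frac{1}{2\pi}\int_{-\pi/T}^{\pi/T}\log\left(1 + P\cdot S_{h_c}(j\omega)\right)d\omega\cdot T\right\} - 1\right\} \\
&\approx \frac{1}{P\cdot T}\cdot\left\{\exp\left\{\frac{1}{2\pi}\int_{-\infty}^{\infty}\log\left(1 + P\cdot S_{h_c}(j\omega)\right)d\omega\cdot T\right\} - 1\right\} \\
&\approx \frac{1}{P\cdot T}\cdot\frac{1}{2\pi}\int_{-\infty}^{\infty}\log\left(1 + P\cdot S_{h_c}(j\omega)\right)d\omega\cdot T = \frac{1}{P}\cdot\frac{1}{2\pi}\int_{-\infty}^{\infty}\log\left(1 + P\cdot S_{h_c}(j\omega)\right)d\omega,
\end{align*}

which is the same as (23) in our proof.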



B. Case Study: The Continuous-Time Gauss-Markov Fading Model

In this subsection, we apply Theorem 4.1 to analyze the continuous-time Gauss-Markov fading
processes. Such a process has autocorrelation function

\[ K_{h_c}(\tau) = (1 - \epsilon_c)^{|\tau|/2}, \]

where the parameter $0 < \epsilon_c < 1$ characterizes the channel variation, analogously to $\epsilon$ for the
discrete-time case in Section III. The spectral density function of the process is

\[ S_{h_c}(j\omega) = \frac{|\log(1 - \epsilon_c)|}{\omega^2 + (\log(1 - \epsilon_c))^2/4}. \]

Fig. 6. The asymptotic achievable rate $\lim_{T\to 0}(R/T)$ vs. the envelope $P$, for recursive training with complex proper PSK on
the continuous-time Gauss-Markov fading channel with innovation rate $\epsilon_c = 0.9$. The dashed-dot curves indicate the limiting
behaviors for small and large $P$.

Applying Theorem 4.1 we find that the recursive training scheme using PSK with a wide
bandwidth asymptotically achieves an information rate

\begin{align}
\lim_{T\to 0}\frac{R}{T} &= P - \frac{|\log(1 - \epsilon_c)|}{2}\cdot\left[\sqrt{1 + \frac{4P}{|\log(1 - \epsilon_c)|}} - 1\right] \tag{29} \\
&= \frac{1}{|\log(1 - \epsilon_c)|}\cdot P^2 + o(P^2) \quad \text{as } P \to 0. \tag{30}
\end{align}

Figure 6 illustrates the achievable rate (29) vs. $P$ for $\epsilon_c = 0.9$.
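The closed form (29) and its limiting behaviors are easy to tabulate. The sketch below evaluates (29) for $\epsilon_c = 0.9$ at one small and one large envelope power and compares against the leading term of (30) and against the large-$P$ limit (28); the function names and probe values are assumptions made for illustration.

```python
import math


def gauss_markov_wideband_rate(P, eps_c):
    """Closed form (29): limiting rate in nats per unit time."""
    b = abs(math.log(1.0 - eps_c))
    return P - 0.5 * b * (math.sqrt(1.0 + 4.0 * P / b) - 1.0)


def low_power_approximation(P, eps_c):
    """Leading term of (30), valid as P -> 0."""
    return P * P / abs(math.log(1.0 - eps_c))


eps_c = 0.9
small = gauss_markov_wideband_rate(1e-3, eps_c)
large = gauss_markov_wideband_rate(1e4, eps_c)

print(small, low_power_approximation(1e-3, eps_c))  # quadratic regime
print(large / 1e4)                                  # approaches 1 for large P, cf. (28)
```

For very small $P$ the two terms in (29) nearly cancel, so a production implementation might prefer a series expansion there; plain double precision is adequate for the values probed here.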



C. Case Study: Clarke's Fading Model

In this subsection, we apply Theorem 4.1 to analyze Clarke's fading processes. Such a process
is usually characterized by its spectral density function [17]

\[ S_{h_c}(j\omega) = \begin{cases} \dfrac{2}{\omega_m}\cdot\dfrac{1}{\sqrt{1 - (\omega/\omega_m)^2}}, & \text{if } |\omega| < \omega_m \\[2mm] 0, & \text{otherwise,} \end{cases} \]

where $\omega_m$ is the maximum Doppler frequency.

Fig. 7. The asymptotic achievable rate $\lim_{T\to 0}(R/T)$ vs. the envelope $P$, for recursive training with complex proper PSK on
Clarke's fading channel with maximum Doppler frequency $\omega_m = 100$. The dashed-dot curves indicate the limiting behaviors
for small and large $P$.

Applying Theorem 4.1 we find that

\[ \lim_{T\to 0}\frac{R}{T} = \begin{cases} \dfrac{\omega_m}{\pi}\cdot\left\{\log\dfrac{\omega_m}{P} - \sqrt{1 - (2P/\omega_m)^2}\cdot\log\dfrac{1 + \sqrt{1 - (2P/\omega_m)^2}}{2P/\omega_m}\right\}, & \text{if } \dfrac{2P}{\omega_m} \le 1 \\[4mm] \dfrac{\omega_m}{\pi}\cdot\left\{\log\dfrac{\omega_m}{P} + \sqrt{(2P/\omega_m)^2 - 1}\cdot\arctan\sqrt{(2P/\omega_m)^2 - 1}\right\}, & \text{if } \dfrac{2P}{\omega_m} > 1. \end{cases} \tag{31} \]

For large $P$, the asymptotic behavior of (31) is consistent with (28). For small $P$, however, the
integral in (27) diverges, hence the asymptotic behavior of (31) scales super-quadratically with
$P$. After some manipulations of (31), we find that

\[ \lim_{T\to 0}\frac{R}{T} = \frac{2}{\pi\omega_m}\cdot\log\frac{1}{P}\cdot P^2 + O(P^2) \quad \text{as } P \to 0. \tag{32} \]

Figure 7 illustrates the achievable rate (31) vs. $P$ for $\omega_m = 100$. We notice that, for small $P$,
the asymptotic expansion (32) is accurate.
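The closed form for Clarke's model can be cross-checked against a direct numerical evaluation of (20). The sketch below assumes the spectrum $S_{h_c}(j\omega) = (2/\omega_m)/\sqrt{1 - (\omega/\omega_m)^2}$ on $|\omega| < \omega_m$ and the two-branch closed form with $c = 2P/\omega_m$; the substitution $\omega = \omega_m\sin\phi$ removes the band-edge singularity before a midpoint rule is applied. Function names and probe values are illustrative assumptions.

```python
import math


def clarke_wideband_rate(P, omega_m):
    """Two-branch closed form for lim R/T under Clarke's spectrum (assumed form of (31))."""
    c = 2.0 * P / omega_m
    if c <= 1.0:
        root = math.sqrt(1.0 - c * c)
        bracket = math.log(omega_m / P) - root * math.log((1.0 + root) / c)
    else:
        root = math.sqrt(c * c - 1.0)
        bracket = math.log(omega_m / P) + root * math.atan(root)
    return omega_m / math.pi * bracket


def clarke_rate_by_quadrature(P, omega_m, n_grid=20000):
    """Evaluate (20) numerically after substituting omega = omega_m * sin(phi)."""
    c = 2.0 * P / omega_m
    step = 0.5 * math.pi / n_grid
    total = 0.0
    for i in range(n_grid):
        phi = (i + 0.5) * step  # midpoints never reach pi/2, so cos(phi) > 0
        total += math.cos(phi) * math.log1p(c / math.cos(phi)) * step
    return P - omega_m / math.pi * total


omega_m = 100.0
for P in (25.0, 100.0):  # one point on each branch of the closed form
    print(P, clarke_wideband_rate(P, omega_m), clarke_rate_by_quadrature(P, omega_m))

# Super-quadratic scaling: rate / P^2 keeps growing as P decreases.
print(clarke_wideband_rate(1e-2, omega_m) / 1e-4, clarke_wideband_rate(1e-3, omega_m) / 1e-6)
```

The closed form and the quadrature agree on both branches, and the normalized rate $R/(T P^2)$ increases as $P$ decreases, in line with the logarithmic factor in (32).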



V. Concluding Remarks 



For fading channels that exhibit temporal correlation, a key to enhancing communication
performance is efficiently exploiting the implicit CSI embedded in the fading processes. From the
preceding developments in this paper, we see that a recursive training scheme, which performs
channel estimation and demodulation/decoding in an alternating manner, accomplishes this job
reasonably well, especially when the channel fading varies slowly. The main idea of recursive
training is to repeatedly use decisions of previous information symbols as pilots, and to ensure
the reliability of these decisions by coding over sufficiently long blocks.

Throughout this paper, we restrict the channel inputs to complex proper PSK, which is not
optimal in general for Rayleigh fading channels without CSI. There are two main motivations
for this choice. First, compared to other channel inputs such as circular complex Gaussian,
PSK leads to a significant simplification of the analytical developments. As we saw, recursive
training with PSK converts the original fading channel without CSI into a series of parallel
sub-channels, each with estimated receive CSI but additional noise that remains circular complex
white Gaussian. In this paper we mainly investigate the steady-state limiting channel behavior;
however, it may be worth mentioning that, using the induced channel model presented in Section II,
exact evaluation of the transient channel behavior is straightforward with the aid of numerical
methods.

Second, PSK inputs perform reasonably well in the moderate to low SNR regime. This is due to
the fact that, for fading channels with perfect receive CSI, as SNR vanishes, channel capacity can
be asymptotically achieved by rather general complex proper inputs besides circular complex
Gaussian [9]. The main contribution of our work is that it clearly separates the effect of an
input peak-power constraint and the effect of replacing optimal peak-limited inputs with PSK,
which is non-bursty in both time and frequency. It is shown that, for slowly time-varying fading
processes, the rate loss from PSK inputs is essentially negligible. Furthermore, as revealed by
the non-asymptotic analysis for discrete-time Gauss-Markov fading processes, there appear to
be non-vanishing SNRs at which near-coherent performance is attainable with recursive training
and PSK.